The RedHot analysis was completed to compare the health of western redcedar across urban heat islands. Redcedar health data were collected by community scientists in the Western Redcedar Dieback Map project on iNaturalist. Urban heat data were commissioned by three cities in the northwest using methods described in Voelkel and Shandas (2017).
Observations of western redcedar were downloaded from iNaturalist, and urban heat data were downloaded from open data portals or provided by contacts at the City of Tacoma, King County (Washington), and Portland. Trees in WA were also evaluated based on EHD Ranks, and HOLC data were investigated for each city.
Note that the temperature data differ for each dataset and may have been collected slightly differently. Temperature data therefore need to be standardized (as the difference from each dataset's mean) before temperatures can be compared region-wide.
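A minimal sketch of that standardization, computing each observation's distance from its city's mean temperature (the column names `temp_af` and `Area` are placeholders; the actual shapefile column names differ):

```r
library(dplyr)

# Standardize temperatures within each city so they can be
# compared region-wide: distance from the city's mean.
data <- data %>%
  group_by(Area) %>%
  mutate(dist.from.mean.af = temp_af - mean(temp_af, na.rm = TRUE)) %>%
  ungroup()
```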
Environmental Health Disparity Rank data (Full EHD Rank data, v2) were downloaded on 6/17/23 from https://geo.wa.gov/datasets/WADOH::full-environmental-health-disparities-version-2-extract/explore
Metadata is available in this technical report: https://doh.wa.gov/sites/default/files/2022-07/311-011-EHD-Map-Tech-Report_0.pdf?uid=634dcf4aec2b5
City tree data were ‘joined by attribute’ separately, so we have three different tree datasets to work with or merge.
Also, because the UHI data were extracted with shapefiles, column names are limited to 10 characters. Therefore, the exported UHI data include only iNat ID numbers and UHI data. These were then re-merged (see below) with the iNat data to restore the remaining columns with proper names.
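A sketch of that re-merge, assuming the shapefile export retains the iNat `id` column (the data frame names `uhi.export` and `inat.data` are placeholders):

```r
library(dplyr)

# Re-attach the full, properly named iNat columns to the truncated
# UHI export by matching on iNat ID.
merged.data <- inat.data %>%
  left_join(uhi.export, by = "id")
```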
Filter data to only include necessary columns
[1] “id”
[23] “latitude”
[24] “longitude”
[35] “place_town_name”
[36] “place_county_name”
Export data for QGIS data joins
Tacoma - 357 Trees
Portland - 465 Trees
King County - 516 Trees
Export final .shp files as .csv
Re-import Data after following QGIS Methods
Note that we needed to convert Tacoma temperatures to Fahrenheit to match King County.
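That conversion is a one-liner (a sketch; `temp_af` is a placeholder column name):

```r
# Convert Tacoma temperatures from Celsius to Fahrenheit so all
# three datasets share the same units.
tacoma.data$temp_af <- tacoma.data$temp_af * 9 / 5 + 32
```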
Some of the iNat project questions have changed since the project was created, so some answers need to be adjusted to be consistent throughout the project.
## Warning: Removed 36 rows containing missing values (`geom_point()`).
Should we remove these four outliers with distances from the mean AF temperature of less than -8?
## Warning in geom_histogram(stat = "count"): Ignoring unknown parameters:
## `binwidth`, `bins`, and `pad`
## Warning in geom_histogram(stat = "count"): Ignoring unknown parameters:
## `binwidth`, `bins`, and `pad`
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The data are likely zero-inflated.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Some data need to be cleaned: healthy trees should not have a dieback percent of 100.
Candidate approaches: binomial distribution, ordinal regression, or maybe a Poisson distribution?
Some cleaning is needed: some healthy observations have dieback recorded, and some dead trees have 0% dieback.
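One way to flag those inconsistent records, assuming placeholder column names `tree.canopy.symptoms` and `field.dieback.prop` for the health class and dieback proportion:

```r
library(dplyr)

# Flag records where the health class and the dieback value disagree:
# "Healthy" trees with dieback, or dead trees with zero dieback.
inconsistent <- data %>%
  filter((tree.canopy.symptoms == "Healthy" & field.dieback.prop > 0) |
         (tree.canopy.symptoms == "Tree is dead" & field.dieback.prop == 0))
```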
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Note we need to remove some outliers
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
Binary Response (2 categories)
## # A tibble: 2 × 2
## # Groups: binary.tree.canopy.symptoms [2]
## binary.tree.canopy.symptoms n
## <fct> <int>
## 1 Healthy 815
## 2 Unhealthy 424
Filtered response (4 categories)
## # A tibble: 4 × 2
## # Groups: reclassified.tree.canopy.symptoms [4]
## reclassified.tree.canopy.symptoms n
## <fct> <int>
## 1 Healthy 815
## 2 Thinning Canopy 165
## 3 Dead Top 111
## 4 Other 148
## # A tibble: 146 × 2
## # Groups: user_login [146]
## user_login n
## <chr> <int>
## 1 abbielisabeth 4
## 2 abbigail_white 5
## 3 abe 1
## 4 adrienne_stclair 2
## 5 akreiner 6
## 6 alexis_mushroom 57
## 7 alginger 4
## 8 amy_boucher 1
## 9 and_allies 3
## 10 angela_mabel 50
## # ℹ 136 more rows
## n.participants n.obs
## 1 146 1239
## n.participants n.obs
## 1 96 785
## # A tibble: 3 × 1
## Area
## <chr>
## 1 King County
## 2 Portland
## 3 Tacoma
Does tree dieback (proportion/percent dieback) increase with increases in urban heat?
Because the response variable is a percent (we can consider it a proportion too), we can do regression using a beta distribution rather than a linear regression. Because the data include exact zeros and ones, we also need a zero- or one-inflated distribution.
The original model threw the following error because we had dead trees with proportions of 1.0.
zibeta.daily <- glmmTMB(field.dieback.prop ~ DN_AF1 + (1|Area), data = data, ziformula = ~1, family = beta_family())

## Error in eval(family$initialize) : y values must be 0 <= y < 1
One way around this is to change 1s to 0.99, as suggested by Ben Bolker here. Another option, as noted in the GLMM FAQ (Beta GLMMs section), is to use the brms package, which can handle zero-one inflated models.
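A sketch of the squeeze approach, nudging exact 1s just inside the open interval required by the beta family (the 0.99 value follows the suggestion above):

```r
# The beta family requires 0 <= y < 1, so shrink exact 1s slightly.
data$field.dieback.prop[data$field.dieback.prop == 1] <- 0.99
```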
Another, possibly more biologically relevant, option is to remove the dead trees from the analysis. This may make the most sense because we're not sure why they died.
Dead trees were excluded from the models below.
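The zero-inflated beta fit summarized below can be reproduced with a call like this (a sketch; it assumes `data` already excludes dead trees and has the standardized temperature column):

```r
library(glmmTMB)

# Zero-inflated beta regression: dieback proportion as a function of
# standardized daily temperature, with city (Area) as a random
# intercept. Temperature also enters the zero-inflation model.
zibeta.daily <- glmmTMB(
  field.dieback.prop ~ dist.from.mean.daily + (1 | Area),
  ziformula = ~ dist.from.mean.daily,
  family    = beta_family(),
  data      = data
)
summary(zibeta.daily)
```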
## Family: beta ( logit )
## Formula: field.dieback.prop ~ dist.from.mean.daily + (1 | Area)
## Zero inflation: ~dist.from.mean.daily
## Data: data
##
## AIC BIC logLik deviance df.resid
## 1130.8 1161.5 -559.4 1118.8 1233
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Area (Intercept) 0.02493 0.1579
## Number of obs: 1239, groups: Area, 3
##
## Dispersion parameter for beta family (): 2.32
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.20299 0.11043 -10.894 <2e-16 ***
## dist.from.mean.daily 0.05940 0.03501 1.697 0.0898 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.57832 0.05956 9.71 < 2e-16 ***
## dist.from.mean.daily -0.13001 0.03988 -3.26 0.00111 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In general, we know temperature is relevant for predicting percent dieback.
## dAIC df
## zibeta.af 0.0 6
## zibeta.daily 16.2 6
## zibeta.pm 24.9 6
## zibeta.am 29.2 6
The lowest score indicates the most preferred model. Differences in AIC scores can be interpreted as the ‘extra information lost’ by using the worse model in comparison to the better one.
The model with AF temperatures was the best fit.
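The comparison table above can be produced with `bbmle::AICtab()` (a sketch, assuming the four fitted models exist in the session):

```r
library(bbmle)

# Compare the zero-inflated beta models fit with the four temperature
# variables; dAIC is each model's difference from the lowest-AIC model.
AICtab(zibeta.af, zibeta.daily, zibeta.pm, zibeta.am)
```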
## Family: beta ( logit )
## Formula: field.dieback.prop ~ dist.from.mean.af + (1 | Area)
## Zero inflation: ~dist.from.mean.af
## Data: data
##
## AIC BIC logLik deviance df.resid
## 1114.6 1145.3 -551.3 1102.6 1233
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Area (Intercept) 0.03499 0.187
## Number of obs: 1239, groups: Area, 3
##
## Dispersion parameter for beta family (): 2.34
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.22920 0.12552 -9.793 <2e-16 ***
## dist.from.mean.af 0.06471 0.02567 2.521 0.0117 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.57931 0.05986 9.677 < 2e-16 ***
## dist.from.mean.af -0.11940 0.02504 -4.769 1.85e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The conditional model indicates that, for observations with dieback proportions greater than 0, distance from the mean AF temperature influences the proportion of dieback. In glmmTMB, the zero-inflation model predicts the probability that an observation is a structural zero, so the negative estimate for dist.from.mean.af means hotter-than-average locations are less likely to have zero dieback.

> Interesting that the distance from mean AF has a negative estimate, though - hard to know if it is observations higher than the mean or lower than the mean (i.e., as temp increases, does distance from mean increase above or below the mean?)
Including a predictor in the ziformula part of the model tests whether heat influences whether there is any dieback at all.
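The zero-inflation coefficients are on the logit scale; they can be converted to probabilities with `plogis()` (a sketch using the estimates from the AF model above):

```r
# Probability of zero dieback at the local mean AF temperature
# (zero-inflation intercept, logit scale) -- roughly 0.64.
plogis(0.579)

# At 5 degrees above the local mean, the probability of zero dieback
# drops, i.e., dieback becomes more likely.
plogis(0.579 - 0.119 * 5)
```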
Possible understanding from: https://stats.stackexchange.com/questions/466047/interpreting-output-for-glmmtmb-for-zero-inflated-count-data
Answers reference pages 382-383, which explain all components of the model summary:
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J., Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2), 378-400. https://doi.org/10.32614/RJ-2017-066
##
## DHARMa zero-inflation test via comparison to expected zeros with
## simulation under H0 = fitted model
##
## data: simulationOutput
## ratioObsSim = 0.99986, p-value = 1
## alternative hypothesis: two.sided
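The zero-inflation test above can be reproduced with DHARMa (a sketch, assuming `zibeta.af` is the fitted glmmTMB model):

```r
library(DHARMa)

# Simulate residuals from the fitted model, then compare the observed
# number of zeros to the number expected under the fitted model.
simulationOutput <- simulateResiduals(zibeta.af)
testZeroInflation(simulationOutput)
```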
## Family: beta ( logit )
## Formula:
## field.dieback.prop ~ dist.from.mean.af + Environm_1 + (1 | Area)
## Zero inflation: ~dist.from.mean.af + Environm_1
## Data: wa.data
##
## AIC BIC logLik deviance df.resid
## 754.3 791.6 -369.1 738.3 779
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Area (Intercept) 0.019 0.1378
## Number of obs: 787, groups: Area, 2
##
## Dispersion parameter for beta family (): 2.34
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.22708 0.23968 -5.120 3.06e-07 ***
## dist.from.mean.af 0.11767 0.04434 2.654 0.00795 **
## Environm_1 0.01394 0.02707 0.515 0.60644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.23445 0.25794 0.909 0.363
## dist.from.mean.af -0.06615 0.05451 -1.214 0.225
## Environm_1 0.00023 0.03302 0.007 0.994
## Family: beta ( logit )
## Formula: field.dieback.prop ~ dist.from.mean.af + Poverty + (1 | Area)
## Zero inflation: ~dist.from.mean.af + Poverty
## Data: wa.data
##
## AIC BIC logLik deviance df.resid
## 753.2 790.5 -368.6 737.2 779
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Area (Intercept) 0.02134 0.1461
## Number of obs: 787, groups: Area, 2
##
## Dispersion parameter for beta family (): 2.35
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.216093 0.165280 -7.358 1.87e-13 ***
## dist.from.mean.af 0.108746 0.045386 2.396 0.0166 *
## Poverty 0.004307 0.004919 0.876 0.3812
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.135740 0.144838 0.937 0.349
## dist.from.mean.af -0.075528 0.055810 -1.353 0.176
## Poverty 0.004635 0.005813 0.797 0.425
## Family: beta ( logit )
## Formula: field.dieback.prop ~ dist.from.mean.af + grade + (1 | Area)
## Zero inflation: ~dist.from.mean.af + Poverty
## Data: holc.data
##
## AIC BIC logLik deviance df.resid
## 258.2 295.4 -119.1 238.2 295
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Area (Intercept) 2.507e-10 1.583e-05
## Number of obs: 305, groups: Area, 2
##
## Dispersion parameter for beta family (): 2.57
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.30573 0.30234 -4.319 1.57e-05 ***
## dist.from.mean.af 0.12976 0.07555 1.718 0.0859 .
## gradeB -0.23018 0.34161 -0.674 0.5004
## gradeC 0.24652 0.32804 0.751 0.4524
## gradeD -0.18088 0.41120 -0.440 0.6600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.010073 0.264888 0.038 0.970
## dist.from.mean.af -0.149360 0.091695 -1.629 0.103
## Poverty 0.014059 0.009987 1.408 0.159
Which temperatures (morning, afternoon, eve) are most important for determining tree health?
Does dieback increase with increases in pollution?
Which pollution data are important for determining tree health?